## [1] "/Users/maxalekhnovich/Downloads"
## [1] "X" "fixed.acidity" "volatile.acidity"
## [4] "citric.acid" "residual.sugar" "chlorides"
## [7] "free.sulfur.dioxide" "total.sulfur.dioxide" "density"
## [10] "pH" "sulphates" "alcohol"
## [13] "quality"
## X fixed.acidity volatile.acidity citric.acid
## Min. : 1.0 Min. : 4.60 Min. :0.1200 Min. :0.000
## 1st Qu.: 400.5 1st Qu.: 7.10 1st Qu.:0.3900 1st Qu.:0.090
## Median : 800.0 Median : 7.90 Median :0.5200 Median :0.260
## Mean : 800.0 Mean : 8.32 Mean :0.5278 Mean :0.271
## 3rd Qu.:1199.5 3rd Qu.: 9.20 3rd Qu.:0.6400 3rd Qu.:0.420
## Max. :1599.0 Max. :15.90 Max. :1.5800 Max. :1.000
## residual.sugar chlorides free.sulfur.dioxide
## Min. : 0.900 Min. :0.01200 Min. : 1.00
## 1st Qu.: 1.900 1st Qu.:0.07000 1st Qu.: 7.00
## Median : 2.200 Median :0.07900 Median :14.00
## Mean : 2.539 Mean :0.08747 Mean :15.87
## 3rd Qu.: 2.600 3rd Qu.:0.09000 3rd Qu.:21.00
## Max. :15.500 Max. :0.61100 Max. :72.00
## total.sulfur.dioxide density pH sulphates
## Min. : 6.00 Min. :0.9901 Min. :2.740 Min. :0.3300
## 1st Qu.: 22.00 1st Qu.:0.9956 1st Qu.:3.210 1st Qu.:0.5500
## Median : 38.00 Median :0.9968 Median :3.310 Median :0.6200
## Mean : 46.47 Mean :0.9967 Mean :3.311 Mean :0.6581
## 3rd Qu.: 62.00 3rd Qu.:0.9978 3rd Qu.:3.400 3rd Qu.:0.7300
## Max. :289.00 Max. :1.0037 Max. :4.010 Max. :2.0000
## alcohol quality
## Min. : 8.40 Min. :3.000
## 1st Qu.: 9.50 1st Qu.:5.000
## Median :10.20 Median :6.000
## Mean :10.42 Mean :5.636
## 3rd Qu.:11.10 3rd Qu.:6.000
## Max. :14.90 Max. :8.000
## [1] 13
General Distributions
The graph on the left shows the distribution of fixed acidity. The median is approximately 8 and there is a significant number of outliers both above and below the mean. The bar graph on the right shows the same distribution just in another format that
#similar graph to the one above but with volatile acidity
grid.arrange(ggplot(data = redInfo, aes(x = 1, y = volatile.acidity)) +
               ylab("Volatile Acidity levels") +
               ggtitle("Volatile Acidity Distribution") +
               geom_jitter(alpha = 0.1) +
               geom_boxplot(alpha = 0.2, color = "red"),
             ggplot(data = redInfo, aes(x = volatile.acidity)) +
               xlab("volatile acid levels") +
               geom_histogram(bins = 30),
             ncol = 2)
#chloride distribution
grid.arrange(ggplot(data = redInfo, aes(x = 1, y = chlorides)) +
               ylab("chloride Quantity") +
               ggtitle("Chloride Distribution") +
               geom_jitter(alpha = 0.1) +
               geom_boxplot(alpha = 0.2, color = "blue"),
             ggplot(data = redInfo, aes(x = chlorides)) +
               xlab("chloride levels") + xlim(0, 0.25) +
               geom_histogram(bins = 30),
             ncol = 2)
## Warning: Removed 25 rows containing non-finite values (stat_bin).
Single Variable analysis of volatile acidity and chloride distribution
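The "Removed 25 rows" warning above is a side effect of `xlim()`, which discards out-of-range rows before the histogram is binned. The sketch below contrasts that with `coord_cartesian()`, which only zooms the view. It runs on made-up chloride values rather than the wine data; the `toy` data frame and its numbers are assumptions for illustration.

```r
library(ggplot2)

# Synthetic stand-in for the chlorides column (made-up values, not the wine data).
set.seed(1)
toy <- data.frame(chlorides = c(runif(95, 0.03, 0.15), runif(5, 0.30, 0.60)))

# Number of rows that fall outside the x-limits used in the plot.
n_dropped <- sum(toy$chlorides < 0 | toy$chlorides > 0.25)

# xlim() turns out-of-range values into NA before binning, hence the warning.
p_xlim <- ggplot(toy, aes(x = chlorides)) +
  geom_histogram(bins = 30) +
  xlim(0, 0.25)

# coord_cartesian() bins every row and only restricts the visible window.
p_zoom <- ggplot(toy, aes(x = chlorides)) +
  geom_histogram(bins = 30) +
  coord_cartesian(xlim = c(0, 0.25))
```

Which variant is preferable depends on whether the bins should be computed from the full data or only from the rows inside the window.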
## NULL
## X fixed.acidity volatile.acidity citric.acid
## Min. : 2.0 Min. : 6.500 Min. :0.1800 Min. :0.0000
## 1st Qu.: 380.5 1st Qu.: 8.400 1st Qu.:0.3700 1st Qu.:0.2500
## Median : 597.5 Median : 9.900 Median :0.4600 Median :0.4400
## Mean : 691.5 Mean : 9.914 Mean :0.4847 Mean :0.4085
## 3rd Qu.:1059.2 3rd Qu.:11.200 3rd Qu.:0.5900 3rd Qu.:0.5300
## Max. :1562.0 Max. :15.900 Max. :1.2400 Max. :1.0000
## residual.sugar chlorides free.sulfur.dioxide
## Min. : 0.900 Min. :0.0440 Min. : 3.00
## 1st Qu.: 1.900 1st Qu.:0.0730 1st Qu.: 6.00
## Median : 2.200 Median :0.0840 Median :12.00
## Mean : 2.699 Mean :0.1052 Mean :14.87
## 3rd Qu.: 2.700 3rd Qu.:0.1000 3rd Qu.:20.00
## Max. :15.500 Max. :0.6110 Max. :55.00
## total.sulfur.dioxide density pH sulphates
## Min. : 6.00 Min. :0.9901 Min. :2.740 Min. :0.4000
## 1st Qu.: 20.00 1st Qu.:0.9964 1st Qu.:3.090 1st Qu.:0.5600
## Median : 38.00 Median :0.9974 Median :3.150 Median :0.6500
## Mean : 48.66 Mean :0.9976 Mean :3.125 Mean :0.7076
## 3rd Qu.: 66.00 3rd Qu.:0.9988 3rd Qu.:3.180 3rd Qu.:0.7925
## Max. :289.00 Max. :1.0037 Max. :3.210 Max. :2.0000
## alcohol quality
## Min. : 8.4 Min. :3.000
## 1st Qu.: 9.4 1st Qu.:5.000
## Median : 9.9 Median :6.000
## Mean :10.2 Mean :5.682
## 3rd Qu.:10.9 3rd Qu.:6.000
## Max. :14.9 Max. :8.000
## X fixed.acidity volatile.acidity citric.acid
## Min. : 3.0 Min. : 6.000 Min. :0.180 Min. :0.0000
## 1st Qu.: 429.8 1st Qu.: 7.725 1st Qu.:0.370 1st Qu.:0.2100
## Median : 810.5 Median : 8.300 Median :0.500 Median :0.3100
## Mean : 811.2 Mean : 8.644 Mean :0.511 Mean :0.3074
## 3rd Qu.:1188.5 3rd Qu.: 9.400 3rd Qu.:0.630 3rd Qu.:0.4100
## Max. :1590.0 Max. :13.000 Max. :1.070 Max. :0.7600
## residual.sugar chlorides free.sulfur.dioxide
## Min. : 1.200 Min. :0.01200 Min. : 1.00
## 1st Qu.: 2.000 1st Qu.:0.07200 1st Qu.: 7.00
## Median : 2.200 Median :0.08000 Median :12.50
## Mean : 2.568 Mean :0.08600 Mean :15.28
## 3rd Qu.: 2.775 3rd Qu.:0.09275 3rd Qu.:21.00
## Max. :12.900 Max. :0.35800 Max. :66.00
## total.sulfur.dioxide density pH sulphates
## Min. : 6.00 Min. :0.9906 Min. :3.220 Min. :0.370
## 1st Qu.: 21.00 1st Qu.:0.9960 1st Qu.:3.243 1st Qu.:0.550
## Median : 38.00 Median :0.9969 Median :3.270 Median :0.620
## Mean : 49.57 Mean :0.9969 Mean :3.269 Mean :0.648
## 3rd Qu.: 69.00 3rd Qu.:0.9979 3rd Qu.:3.290 3rd Qu.:0.720
## Max. :165.00 Max. :1.0029 Max. :3.310 Max. :1.560
## alcohol quality
## Min. : 9.00 Min. :3.000
## 1st Qu.: 9.50 1st Qu.:5.000
## Median :10.00 Median :6.000
## Mean :10.37 Mean :5.663
## 3rd Qu.:11.07 3rd Qu.:6.000
## Max. :13.40 Max. :8.000
## X fixed.acidity volatile.acidity citric.acid
## Min. : 1.0 Min. : 4.600 Min. :0.12 Min. :0.0000
## 1st Qu.: 406.0 1st Qu.: 6.700 1st Qu.:0.43 1st Qu.:0.0300
## Median : 892.0 Median : 7.200 Median :0.56 Median :0.1300
## Mean : 853.5 Mean : 7.284 Mean :0.56 Mean :0.1772
## 3rd Qu.:1285.0 3rd Qu.: 7.800 3rd Qu.:0.66 3rd Qu.:0.3000
## Max. :1599.0 Max. :11.600 Max. :1.58 Max. :0.7800
## residual.sugar chlorides free.sulfur.dioxide
## Min. : 1.200 Min. :0.03400 Min. : 1.00
## 1st Qu.: 1.900 1st Qu.:0.06700 1st Qu.: 9.00
## Median : 2.100 Median :0.07700 Median :15.00
## Mean : 2.436 Mean :0.07852 Mean :16.73
## 3rd Qu.: 2.500 3rd Qu.:0.08500 3rd Qu.:22.00
## Max. :13.900 Max. :0.26700 Max. :72.00
## total.sulfur.dioxide density pH sulphates
## Min. : 7.00 Min. :0.9902 Min. :3.320 Min. :0.3300
## 1st Qu.: 24.00 1st Qu.:0.9952 1st Qu.:3.360 1st Qu.:0.5500
## Median : 38.00 Median :0.9962 Median :3.400 Median :0.6100
## Mean : 43.68 Mean :0.9962 Mean :3.434 Mean :0.6363
## 3rd Qu.: 58.00 3rd Qu.:0.9973 3rd Qu.:3.480 3rd Qu.:0.7100
## Max. :160.00 Max. :1.0026 Max. :4.010 Max. :1.1600
## alcohol quality
## Min. : 8.70 Min. :3.000
## 1st Qu.: 9.70 1st Qu.:5.000
## Median :10.40 Median :6.000
## Mean :10.57 Mean :5.597
## 3rd Qu.:11.20 3rd Qu.:6.000
## Max. :14.00 Max. :8.000
Fixed acidity levels for wines of the highest quality fall in a specific range, between 6 and 8. This is not true of wines of lesser quality, which can have a wider range of fixed acidity. The first graph shows that the majority of wines with a high pH have little to no citric acid, and most also have a total sulfur dioxide level under 80. Wines with medium pH are scattered. Wines with low pH tend to have less sulfur dioxide than medium-pH wines and also have a higher concentration of citric acid, from 0.35 to 0.6, judging by the darkness of the points at an alpha value of 1/10.
The second graph shows that there is not a significant relationship between pH and quality, as wines of the highest quality appear in all three graphs. The citric acid quantity for high- and medium-pH wines is lower than the citric acid level for low-pH wines, with few outliers.
The first graph shows that there is a positive relationship between alcohol and residual sugar. In low-quality wines (3 and 4), there is a significant rise in alcohol and residual sugar. The average-quality wines don’t have a significant increase in residual sugar levels, but do have a more varied level of alcohol. There is a slight increase in alcohol and residual sugar levels for the wines of the highest quality.
#let's get some stats on the density variable
densitySummary = summary(redInfo$density)
#let's get some stats on the total sulfur dioxide variable
totalSulferDSummary = summary(redInfo$total.sulfur.dioxide)
densitySummary
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9901 0.9956 0.9968 0.9967 0.9978 1.0037
totalSulferDSummary
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.00 22.00 38.00 46.47 62.00 289.00
#let's examine density vs total sulfur dioxide
#let's make the colors stand out
plot1 = ggplot(data = redInfo, aes(x = density,
                                   y = total.sulfur.dioxide,
                                   color = quality)) +
  scale_color_continuous(low = "blue", high = "red") + geom_point() +
  #axis labels now match the plotted variables
  labs(title = "Density VS Total Sulfur Dioxide",
       x = "Density Level",
       y = "Total Sulfur Dioxide")
#let's try the same graph with a facet wrap of quality rather than the color
#will also change the color variable to represent the free sulfur dioxide
ggplot(data = redInfo, aes(x = density, y = total.sulfur.dioxide,
                           color = free.sulfur.dioxide)) +
  scale_color_continuous(low = "blue", high = "red") +
  geom_point() + facet_wrap(~quality) +
  labs(title = "Density VS Total Sulfur Dioxide",
       subtitle = "Faceted by quality",
       x = "Density Level",
       y = "Total Sulfur Dioxide",
       color = "Free Sulfur Dioxide")
#want summary of citric acid and residual sugar before graphing
citricAcidSummary = summary(redInfo$citric.acid)
alcoholSummary = summary(redInfo$alcohol)
residualSugarSummary = summary(redInfo$residual.sugar)
citricAcidSummary
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.090 0.260 0.271 0.420 1.000
residualSugarSummary
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.900 1.900 2.200 2.539 2.600 15.500
alcoholSummary
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.50 10.20 10.42 11.10 14.90
#let's try a different combination with the same color scheme: citric acid vs residual sugar, with color representing alcohol
ggplot(data = redInfo, aes(x = citric.acid,
                           y = residual.sugar,
                           color = alcohol)) +
  scale_color_continuous(low = "black", high = "yellow") +
  geom_point() + facet_wrap(~quality) +
  labs(title = "Citric Acid VS Residual Sugar",
       x = "Citric Acid Level",
       y = "Residual Sugar Level")
#graph suggests that wines of each quality level fall in a particular range of both residual sugar and alcohol
#really poor quality wines also have either extremely low citric acid or a lot of citric acid based on the graph
The 1st graph illustrates that wines of higher quality tend to have more free sulfur dioxide, based on the legend on the right. Also, wines of the highest quality have a density slightly above or below 0.995 and a total sulfur dioxide level less than 100, with the majority being under 50. The same can be said about wines of the lowest quality, but those wines also have a much lower free sulfur dioxide level.
The 2nd graph illustrates the citric acid level vs the residual sugar level of wines of varying qualities. The lowest quality wines mostly have little to no citric acid and a residual sugar level around 2. The average quality wines have a higher alcohol level and a wider range of residual sugar levels. For wines of quality 7 and 8, the alcohol level is close to the median alcohol level of 10.20.
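Faceted plots like the two described above are usually written with bare column names inside `aes()` and `facet_wrap()`, so that ggplot2 looks each variable up in the `data` argument for every panel. A minimal sketch on a made-up data frame follows; `toy` and its values are illustrative assumptions, not the wine data.

```r
library(ggplot2)

# Made-up rows standing in for a few wines; values are illustrative only.
toy <- data.frame(
  density              = c(0.994, 0.996, 0.997, 0.995, 0.998, 0.996),
  total.sulfur.dioxide = c(30, 45, 80, 25, 110, 60),
  free.sulfur.dioxide  = c(10, 15, 30, 8, 40, 20),
  quality              = c(3, 5, 5, 6, 6, 8)
)

# Bare names are resolved inside `toy`, so each facet receives only its own rows.
p <- ggplot(toy, aes(x = density, y = total.sulfur.dioxide,
                     color = free.sulfur.dioxide)) +
  geom_point() +
  scale_color_gradient(low = "blue", high = "red") +
  facet_wrap(~quality) +
  labs(x = "Density", y = "Total sulfur dioxide",
       color = "Free sulfur dioxide")

# One panel is drawn per distinct quality score.
n_panels <- length(unique(toy$quality))
```

Naming the legend through `labs(color = ...)` also avoids a raw column name appearing as the legend title.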
#let's try using a facet wrap of a different variable such as residual sugar
badplot = ggplot(data = redInfo, aes(x = density, y = total.sulfur.dioxide)) + facet_wrap(~residual.sugar)
#obviously that's not going to work because there are too many unique residual sugar values
#let's try again by factoring the residual sugar values based on the four quartiles
residualSugarSummary=summary(redInfo$residual.sugar)
residualSugarSummary
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.900 1.900 2.200 2.539 2.600 15.500
redInfo$quality <- factor(redInfo$quality,
labels = c(3, 4, 5, 6, 7,8))
ggplot(redInfo, aes(x = quality,
                    y = residual.sugar)) +
  labs(title = "Quality VS Residual Sugar",
       x = "Quality") +
  geom_boxplot(fill = 'purple', colour = 'orange', alpha = 0.7) +
  #the y-axis title is set here, together with the breaks
  scale_y_continuous(name = 'residual sugar', breaks = seq(2, 10, .5))
#interesting to note that there are very few outliers below Q1
#also after adjusting the breaks, it is easy to see that wines of a higher quality on average have a residual sugar between 2.5 and 3
#let's try total sulfur dioxide vs quality in a similar boxplot
#note: the code below sets breaks only; no axis limit is applied, so outliers remain
ggplot(redInfo, aes(x = quality,
                    y = total.sulfur.dioxide)) +
  geom_boxplot(fill = 'purple', colour = 'yellow', alpha = 0.7) +
  labs(title = "Quality VS Total Sulfur Dioxide",
       x = "Quality") +
  #the y-axis title is set here; the earlier "Residual Sugar Level" label was wrong for this plot
  scale_y_continuous(name = 'total sulfur dioxide', breaks = seq(0, 100, 25))
Based on the above graphs, it seems that residual sugar alone does not play a significant part in determining the quality of the wine. Analyzing total sulfur dioxide also shows that there isn’t a significant trend between quality and total sulfur dioxide alone.
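One way to back up a "no clear trend" reading of a boxplot is to compare per-quality medians and a rank correlation. The sketch below uses made-up values (the `toy` data frame is an assumption for illustration, not the wine data). Spearman's method is chosen here because quality is an ordinal score.

```r
# Made-up quality scores and residual sugar values, for illustration only.
toy <- data.frame(
  quality        = c(3, 4, 5, 5, 6, 6, 7, 8),
  residual.sugar = c(2.1, 2.0, 2.3, 1.9, 2.2, 2.4, 2.0, 2.2)
)

# Median residual sugar within each quality score.
med_by_quality <- aggregate(residual.sugar ~ quality, data = toy, FUN = median)

# Rank correlation between quality and residual sugar (bounded by -1 and 1).
rho <- cor(toy$quality, toy$residual.sugar, method = "spearman")

print(med_by_quality)
print(rho)
```

Medians that stay roughly flat across scores, together with a correlation near zero, would be consistent with the conclusion drawn from the boxplot.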
#want to compare alcohol levels across the quality scores
chloridesAlcoholQuality = ggplot(aes(x = quality,
                                     y = alcohol),
                                 data = redInfo) +
  geom_jitter(alpha = .3) +
  geom_boxplot(alpha = .5, color = 'blue') +
  #fun replaces the deprecated fun.y argument (ggplot2 3.3.0 and later)
  stat_summary(fun = "mean",
               geom = "point",
               shape = 8,
               size = 4) +
  labs(title = "Alcohol VS Quality",
       x = "Quality",
       y = "Alcohol Level")
chloridesAlcoholQuality
There is a clear trend that wines of a higher quality have a higher amount of alcohol. Also, this graph shows that most wines have a quality of 5 or 6.
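The star markers in the plot above are per-quality means. A minimal sketch of the same calculation in base R follows, on made-up values (`toy` is illustrative, not the wine data). Setting `levels = 3:8` explicitly is an assumption about the scoring scale, taken from the 3 to 8 range seen in this dataset.

```r
# Made-up quality scores and alcohol levels; note that no wine has a score of 4.
toy <- data.frame(
  quality = c(3, 5, 5, 6, 6, 7, 8),
  alcohol = c(9.0, 9.5, 10.1, 10.4, 10.8, 11.5, 12.0)
)

# Explicit levels keep the full 3 to 8 scale even when a score has no rows.
toy$quality <- factor(toy$quality, levels = 3:8)

# Mean alcohol per quality score; scores without rows come back as NA.
mean_alcohol <- tapply(toy$alcohol, toy$quality, mean)
print(mean_alcohol)
```

Passing only `labels = c(3, 4, 5, 6, 7, 8)` to `factor()` works when all six scores are present, but it raises an error if one is missing, which is why the levels are spelled out here.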
## [1] 3 8
## [1] 2.74 4.01
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.50 10.20 10.42 11.10 14.90
FINAL PLOTS
PLOT 1
## [1] 3 8
## [1] 2.74 4.01
PLOT 1 ANALYSIS
The console output shows the range of both the quality values and the pH values for the dataset.
pH has a negative relationship with quality: as the quality improves, the pH decreases. The median pH of wines of quality 3 and 4 is approximately 3.4. Wines of the highest quality have a median pH of 3.25.
PLOT 2
#wines of a higher quality have a lower median pH. There is also a noticeably higher pH for the wines of the lowest quality (3 & 4).
# citric acid vs residual sugar, with color representing alcohol
citricbyresidual = ggplot(data = redInfo, aes(x = citric.acid,
                                              y = residual.sugar,
                                              color = alcohol)) +
  scale_color_continuous(low = "black", high = "yellow") +
  geom_point() +
  facet_wrap(~quality) +
  xlim(0, 0.75) +
  ggtitle("Citric acid level vs Residual Sugar based on Quality") +
  # axis labels now match the mapping: citric acid on x, residual sugar on y
  xlab("citric acid levels") +
  ylab("residual sugar levels") +
  labs(color = "Alcohol")
citricbyresidual
## Warning: Removed 6 rows containing missing values (geom_point).
#alcohol levels for wines of poor quality are extremely low, and wines of quality 3 mostly have a small amount of citric acid and residual sugar. Wines of average quality (5 & 6) have a larger discrepancy in their alcohol levels, but the majority have a residual sugar level under 4 and a maximum citric acid level of 0.75. Wines of the highest quality (7 & 8) have a low amount of residual sugar if their alcohol levels are moderately low. If they have a high alcohol level, they are more likely to also have a higher residual sugar level. The citric acid level of wines of the highest quality is also less than 0.5.
PLOT 2 ANALYSIS
The 2nd graph illustrates the citric acid level vs the residual sugar level of wines of varying qualities. The lowest quality wines mostly have little to no citric acid and a residual sugar level around 2. The average quality wines have a higher alcohol level and a wider range of residual sugar levels. For wines of quality 7 and 8, the alcohol level is close to the median alcohol level of 10.20.
PLOT 3
lowPH <-redInfo[which(redInfo$pH<3.211),]
mediumPH<- redInfo[which((redInfo$pH>=3.211) & (redInfo$pH<=3.31)),]
highPH <- redInfo[which(redInfo$pH>3.31),]
redInfo$quality <- factor(redInfo$quality,
labels = c(3, 4, 5, 6, 7,8))
#lets graph them all on the same plane
#lets try the same with a facet wrap now
#lowPhResidualChlorides
library(gridExtra)
#let's try it for a different arrangement such as citric acid vs total sulfur dioxide
#the stray color argument outside aes() had no effect and is dropped
lowPhcitricTotal = ggplot(data = lowPH, aes(x = citric.acid,
                                            y = total.sulfur.dioxide)) +
  geom_point(alpha = 1/10) +
  ylab("total sulfur dioxide") +
  xlab("citric acid levels") +
  xlim(0, 0.75) +
  #fun replaces the deprecated fun.y argument (ggplot2 3.3.0 and later)
  stat_summary(fun = "mean",
               geom = "point",
               color = "red",
               shape = 8,
               size = 2) +
  ggtitle("Low pH citric acid vs total sulfur dioxide") +
  scale_y_continuous(breaks = seq(0, 160, 20))
#medium pH citric acid vs total sulfur dioxide
mediumPhcitricTotal = ggplot(data = mediumPH, aes(x = citric.acid,
                                                  y = total.sulfur.dioxide)) +
  geom_point(alpha = 1/10) +
  ylab("total sulfur dioxide") +
  xlab("citric acid levels") +
  xlim(0.0, 0.50) +
  #fun replaces the deprecated fun.y argument (ggplot2 3.3.0 and later)
  stat_summary(fun = "mean",
               geom = "point",
               color = "red",
               shape = 8,
               size = 2) +
  ggtitle("Medium pH citric acid vs total sulfur dioxide") +
  scale_y_continuous(breaks = seq(0, 160, 20))
#getting the range of the three values so i can adjust the breaks accordingly
range(lowPH$total.sulfur.dioxide)
## [1] 6 289
range(mediumPH$total.sulfur.dioxide)
## [1] 6 165
range(highPH$total.sulfur.dioxide)
## [1] 7 160
#getting basic stats for the variables as well
summary(lowPH$total.sulfur.dioxide)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.00 20.00 38.00 48.66 66.00 289.00
summary(mediumPH$total.sulfur.dioxide)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.00 21.00 38.00 49.57 69.00 165.00
summary(highPH$total.sulfur.dioxide)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 7.00 24.00 38.00 43.68 58.00 160.00
highPhcitricTotal = ggplot(data = highPH, aes(x = citric.acid,
                                              y = total.sulfur.dioxide)) +
  #geom_point() has no width parameter, so it is dropped here
  geom_point(alpha = 1/10) +
  ylab("total sulfur dioxide") +
  xlim(0, 0.50) +
  xlab("citric acid levels") +
  ggtitle("High pH citric acid vs total sulfur dioxide") +
  #fun replaces the deprecated fun.y argument (ggplot2 3.3.0 and later)
  stat_summary(fun = "mean",
               geom = "point",
               color = "red",
               shape = 8,
               size = 2) +
  scale_y_continuous(breaks = seq(0, 160, 20))
grid.arrange(lowPhcitricTotal,mediumPhcitricTotal,highPhcitricTotal)
## Warning: Removed 4 rows containing non-finite values (stat_summary).
## Warning: Removed 4 rows containing missing values (geom_point).
## Warning: Removed 39 rows containing non-finite values (stat_summary).
## Warning: Removed 39 rows containing missing values (geom_point).
## Warning: Removed 22 rows containing non-finite values (stat_summary).
## Warning: Removed 22 rows containing missing values (geom_point).
#used a low alpha to distinguish points and reduce overplotting
#When the wine has a low pH, defined as a pH under 3.21 (the first quartile), its citric acid is mostly around .5 and an overwhelming number of the points have a total sulfur dioxide under 100. The majority have a citric acid value around .5 and a total sulfur dioxide value under 50. This is based on the darkness of the points.
#When the pH was between Q1 and the median, there was a more varied level of citric acid, but the range of total sulfur dioxide values was narrower than for the low pH wines.
#When the pH was high, meaning greater than the median, a large number of wines had very low citric acid levels and a large number had total sulfur dioxide values between 0 and 60.
PLOT 3 ANALYSIS
The console output above shows the range and the quartiles of total sulfur dioxide for the three subsets: low pH, medium pH, and high pH. It’s interesting to note that the medium-pH and low-pH wines have similar median and Q3 values, whereas the high-pH wines have a lower mean and Q3 than both.
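The three pH subsets can also be built in one step with `cut()` and quantile-based breaks instead of three separate `which()` filters. This is a sketch on made-up values (`toy`, `tsd`, and `pH_group` are hypothetical names). Its intervals are closed on the right, so rows exactly at a quartile may land in a different group than with the thresholds used in this report.

```r
# Made-up pH and total sulfur dioxide (tsd) values, for illustration only.
toy <- data.frame(
  pH  = c(2.9, 3.0, 3.1, 3.2, 3.25, 3.3, 3.35, 3.4, 3.5, 3.6),
  tsd = c(60, 55, 48, 40, 42, 38, 35, 30, 28, 25)
)

# Breaks at the minimum, first quartile, median, and maximum,
# mirroring the low / medium / high split.
breaks <- quantile(toy$pH, probs = c(0, 0.25, 0.5, 1))

# include.lowest = TRUE keeps the minimum pH inside the "low" group.
toy$pH_group <- cut(toy$pH, breaks = breaks,
                    labels = c("low", "medium", "high"),
                    include.lowest = TRUE)

# Group sizes and median total sulfur dioxide per group.
group_sizes   <- table(toy$pH_group)
group_medians <- tapply(toy$tsd, toy$pH_group, median)
```

Keeping the group as a column, rather than as three separate data frames, also allows a single plot with `facet_wrap(~pH_group)`.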
The first graph has a wider range of citric acid, which is why I set the x limit a little higher for that graph. For wines with low pH, a significant number have citric acid levels between 0.4 and 0.6 with total sulfur dioxide levels under 50. The second graph demonstrates that medium-pH wines can have a wide range of both total sulfur dioxide and citric acid, as is evident from the absence of dark points under the alpha setting. The third graph shows that there are many high-pH wines that have little to no citric acid and a total sulfur dioxide under 60. The high pH graph also has almost no slope in comparison to the other two graphs. This suggests that when the pH is high, total sulfur dioxide changes little as citric acid increases.
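The "almost no slope" observation can be checked numerically by fitting a straight line within each pH group and comparing the slopes. A minimal sketch on made-up numbers follows; the `toy` data frame and its `tsd` column are illustrative assumptions, not values from the wine data.

```r
# Made-up citric acid and total sulfur dioxide (tsd) values for two groups.
toy <- data.frame(
  group  = rep(c("low", "high"), each = 5),
  citric = rep(c(0.1, 0.2, 0.3, 0.4, 0.5), times = 2),
  tsd    = c(20, 31, 39, 52, 58,  40, 41, 39, 40, 40)
)

# Fit tsd ~ citric separately in each group and keep only the slope.
slopes <- sapply(split(toy, toy$group), function(d) {
  coef(lm(tsd ~ citric, data = d))[["citric"]]
})
print(slopes)
```

In this toy data the "low" group has a steep positive slope and the "high" group a slope close to zero, which is the pattern the third graph is described as showing.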
CONCLUSION
After analyzing the dataset of 1599 wines based on the 13 variables provided, I came to some conclusions. To start, the quality of the wine appears to be strongly related to its pH. Wines of lower quality tend to have a higher pH than wines of higher quality. Also, wines of higher quality tend to have neither too much nor too little alcohol, and they did not have too much residual sugar. Wines that are very poor tend to have a very small amount of alcohol. Wines that have a larger amount of alcohol are most often average.
I also observed a relationship among pH, citric acid, and total sulfur dioxide. The wines that had a high pH had a citric acid value of zero or close to zero. The citric acid for low-pH wines was around 0.5, while medium-pH wines were less distinct and had a much wider range.
In conclusion, I learned that wines of higher quality tend to have a good balance of significant factors such as pH, residual sugar, and alcohol. Most often, when wines had a significantly high or low value for these factors, the quality of the wine was not very high.
I wish there had been more variables covering factors such as price or sales. I think it would have made the project more interesting to examine the different wines at different price points, especially in comparison to the quality of the wines. Also, I had some issues with trying to categorize the data initially, because the ranges of some of the variables were very large. The quality variable was perfect in the sense that its range was only between three and eight. I overcame this by creating subsets such as “lowPH”, “mediumPH”, and “highPH” that I could use to look at different aspects of the data as well.